EN conditionLBA conditionThis files is a summary of the work made by GMM5 students on data coming from ENVT study.
The data are processed using the following packages:
library(mixOmics)
library(metagenomeSeq)
library(reshape2)
Data are given in two files included in the directory data :
abondances.csv that contains microbiote data:df_abundances <- read.delim("../data/abondances.csv", sep = ",",
stringsAsFactors = FALSE)
summary(df_abundances[ ,1:15])
## X.blast_taxonomy blast_subject blast_perc_identity
## Length:406 Length:406 Min. : 94.18
## Class :character Class :character 1st Qu.: 99.39
## Mode :character Mode :character Median : 99.80
## Mean : 99.32
## 3rd Qu.:100.00
## Max. :100.00
## blast_perc_query_coverage blast_evalue blast_aln_length
## Min. : 98.13 Min. :0 Min. :430.0
## 1st Qu.:100.00 1st Qu.:0 1st Qu.:466.2
## Median :100.00 Median :0 Median :480.0
## Mean : 99.96 Mean :0 Mean :477.4
## 3rd Qu.:100.00 3rd Qu.:0 3rd Qu.:490.0
## Max. :100.00 Max. :0 Max. :512.0
## seed_id observation_name observation_sum
## Length:406 Length:406 Min. : 224.0
## Class :character Class :character 1st Qu.: 588.5
## Mode :character Mode :character Median : 979.5
## Mean : 9753.7
## 3rd Qu.: 2738.0
## Max. :880523.0
## NG.10214_EN10A_lib136338_4869_1 NG.10214_EN10B_lib136339_4869_1
## Min. : 0.0 Min. : 0.0
## 1st Qu.: 0.0 1st Qu.: 0.0
## Median : 0.0 Median : 0.0
## Mean : 108.4 Mean : 108.4
## 3rd Qu.: 12.0 3rd Qu.: 12.0
## Max. :17044.0 Max. :17663.0
## NG.10214_EN11A_lib136340_4869_1 NG.10214_EN11B_lib136341_4869_1
## Min. : 0.0 Min. : 0.0
## 1st Qu.: 0.0 1st Qu.: 0.0
## Median : 1.0 Median : 1.0
## Mean : 108.4 Mean : 108.4
## 3rd Qu.: 6.0 3rd Qu.: 6.0
## Max. :24223.0 Max. :23739.0
## NG.10214_EN15A_lib136342_4869_1 NG.10214_EN15B_lib136343_4869_1
## Min. : 0.00 Min. : 0.00
## 1st Qu.: 0.00 1st Qu.: 0.00
## Median : 3.00 Median : 4.00
## Mean : 108.37 Mean : 108.37
## 3rd Qu.: 15.75 3rd Qu.: 15.75
## Max. :28421.00 Max. :27946.00
The first 9 columns contain information about the different bacteria identified within samples. The following columns contain the two technical replicates (identified by “A” and “B”) of all the farms involved in the study (the farm is identified by a number preceeding the letter “A”/“B”).
Note that some “blast taxonomy” are duplicated:
unique(names(which(table(df_abundances[ ,1]) > 1)))
## [1] "Bacteria;Actinobacteria;Actinobacteria;Corynebacteriales;Corynebacteriaceae;Corynebacterium 1;Corynebacterium sp."
## [2] "Bacteria;Actinobacteria;Actinobacteria;Corynebacteriales;Corynebacteriaceae;Corynebacterium 1;unknown species"
## [3] "Bacteria;Actinobacteria;Actinobacteria;Corynebacteriales;Corynebacteriaceae;Corynebacterium;unknown species"
## [4] "Bacteria;Actinobacteria;Actinobacteria;Corynebacteriales;Dietziaceae;Dietzia;Dietzia sp."
## [5] "Bacteria;Actinobacteria;Actinobacteria;Corynebacteriales;Nocardiaceae;Rhodococcus;Rhodococcus sp."
## [6] "Bacteria;Actinobacteria;Actinobacteria;Micrococcales;Brevibacteriaceae;Brevibacterium;Brevibacterium antiquum"
## [7] "Bacteria;Actinobacteria;Actinobacteria;Micrococcales;Brevibacteriaceae;Brevibacterium;unknown species"
## [8] "Bacteria;Actinobacteria;Actinobacteria;Micrococcales;Intrasporangiaceae;Janibacter;unknown species"
## [9] "Bacteria;Actinobacteria;Actinobacteria;Micrococcales;Microbacteriaceae;Leucobacter;unknown species"
## [10] "Bacteria;Actinobacteria;Actinobacteria;Micrococcales;Micrococcaceae;Arthrobacter;Arthrobacter sp."
## [11] "Bacteria;Actinobacteria;Actinobacteria;Streptomycetales;Streptomycetaceae;Streptomyces;Streptomyces sp."
## [12] "Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Bacteroidaceae;Bacteroides;unknown species"
## [13] "Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Porphyromonadaceae;Petrimonas;unknown species"
## [14] "Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Porphyromonadaceae;Porphyromonas;unknown species"
## [15] "Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Porphyromonadaceae;Proteiniphilum;unknown species"
## [16] "Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Alloprevotella;unknown species"
## [17] "Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Prevotella 1;unknown species"
## [18] "Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Prevotella 2;unknown species"
## [19] "Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Prevotella 9;unknown species"
## [20] "Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Rikenellaceae;Rikenellaceae RC9 gut group;unknown species"
## [21] "Bacteria;Bacteroidetes;Flavobacteriia;Flavobacteriales;Flavobacteriaceae;Bergeyella;unknown species"
## [22] "Bacteria;Bacteroidetes;Flavobacteriia;Flavobacteriales;Flavobacteriaceae;Flavobacterium;unknown species"
## [23] "Bacteria;Bacteroidetes;Sphingobacteriia;Sphingobacteriales;Sphingobacteriaceae;Olivibacter;unknown species"
## [24] "Bacteria;Bacteroidetes;Sphingobacteriia;Sphingobacteriales;Sphingobacteriaceae;Pedobacter;unknown species"
## [25] "Bacteria;Firmicutes;Bacilli;Bacillales;Planococcaceae;Caryophanon;Caryophanon latum"
## [26] "Bacteria;Firmicutes;Bacilli;Bacillales;Staphylococcaceae;Jeotgalicoccus;unknown species"
## [27] "Bacteria;Firmicutes;Bacilli;Bacillales;Staphylococcaceae;Staphylococcus;Staphylococcus sp."
## [28] "Bacteria;Firmicutes;Bacilli;Lactobacillales;Aerococcaceae;Facklamia;unknown species"
## [29] "Bacteria;Firmicutes;Bacilli;Lactobacillales;Aerococcaceae;unknown genus;unknown species"
## [30] "Bacteria;Firmicutes;Bacilli;Lactobacillales;Carnobacteriaceae;Trichococcus;unknown species"
## [31] "Bacteria;Firmicutes;Bacilli;Lactobacillales;Carnobacteriaceae;unknown genus;unknown species"
## [32] "Bacteria;Firmicutes;Bacilli;Lactobacillales;Enterococcaceae;Enterococcus;unknown species"
## [33] "Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;Streptococcus pluranimalium"
## [34] "Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;unknown species"
## [35] "Bacteria;Firmicutes;Clostridia;Clostridiales;Clostridiaceae 1;Clostridium sensu stricto 1;unknown species"
## [36] "Bacteria;Firmicutes;Clostridia;Clostridiales;Family XI;Peptoniphilus;unknown species"
## [37] "Bacteria;Firmicutes;Clostridia;Clostridiales;Lachnospiraceae;Blautia;unknown species"
## [38] "Bacteria;Firmicutes;Clostridia;Clostridiales;Lachnospiraceae;Lachnoclostridium;unknown species"
## [39] "Bacteria;Firmicutes;Clostridia;Clostridiales;Lachnospiraceae;Lachnospiraceae NK3A20 group;unknown species"
## [40] "Bacteria;Firmicutes;Clostridia;Clostridiales;Peptostreptococcaceae;Intestinibacter;unknown species"
## [41] "Bacteria;Firmicutes;Clostridia;Clostridiales;Peptostreptococcaceae;Peptoclostridium;unknown species"
## [42] "Bacteria;Firmicutes;Clostridia;Clostridiales;Ruminococcaceae;Faecalibacterium;unknown species"
## [43] "Bacteria;Firmicutes;Clostridia;Clostridiales;Ruminococcaceae;Ruminococcaceae UCG-005;unknown species"
## [44] "Bacteria;Firmicutes;Erysipelotrichia;Erysipelotrichales;Erysipelotrichaceae;Erysipelothrix;unknown species"
## [45] "Bacteria;Fusobacteria;Fusobacteriia;Fusobacteriales;Leptotrichiaceae;unknown genus;unknown species"
## [46] "Bacteria;Proteobacteria;Alphaproteobacteria;Rhizobiales;Hyphomicrobiaceae;Devosia;unknown species"
## [47] "Bacteria;Proteobacteria;Alphaproteobacteria;Rhizobiales;Methylobacteriaceae;Methylobacterium;Methylobacterium sp."
## [48] "Bacteria;Proteobacteria;Alphaproteobacteria;Rhizobiales;Phyllobacteriaceae;Aquamicrobium;unknown species"
## [49] "Bacteria;Proteobacteria;Alphaproteobacteria;Rhizobiales;Rhizobiaceae;Rhizobium;Agrobacterium tumefaciens"
## [50] "Bacteria;Proteobacteria;Betaproteobacteria;Burkholderiales;Alcaligenaceae;Alcaligenes;Alcaligenes faecalis"
## [51] "Bacteria;Proteobacteria;Betaproteobacteria;Burkholderiales;Comamonadaceae;Comamonas;unknown species"
## [52] "Bacteria;Proteobacteria;Betaproteobacteria;Burkholderiales;Oxalobacteraceae;Massilia;unknown species"
## [53] "Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacteriales;Enterobacteriaceae;Escherichia-Shigella;Escherichia coli"
## [54] "Bacteria;Proteobacteria;Gammaproteobacteria;Pasteurellales;Pasteurellaceae;Mannheimia;Mannheimia sp."
## [55] "Bacteria;Proteobacteria;Gammaproteobacteria;Pseudomonadales;Moraxellaceae;Moraxella;Moraxella oblonga"
## [56] "Bacteria;Proteobacteria;Gammaproteobacteria;Pseudomonadales;Moraxellaceae;Moraxella;unknown species"
## [57] "Bacteria;Proteobacteria;Gammaproteobacteria;Pseudomonadales;Pseudomonadaceae;Pseudomonas;Pseudomonas sp."
## [58] "Bacteria;Proteobacteria;Gammaproteobacteria;Xanthomonadales;Xanthomonadaceae;Luteimonas;unknown species"
## [59] "Bacteria;Proteobacteria;Gammaproteobacteria;Xanthomonadales;Xanthomonadaceae;Stenotrophomonas;Stenotrophomonas sp."
## [60] "Bacteria;Proteobacteria;Gammaproteobacteria;Xanthomonadales;Xanthomonadaceae;Stenotrophomonas;unknown species"
## [61] "Bacteria;Tenericutes;Mollicutes;Mycoplasmatales;Mycoplasmataceae;Mycoplasma;Mycoplasma bovirhinis"
as well as some of the bast id themselves
unique(names(which(table(df_abundances[ ,2]) > 1)))
## [1] "AB680687.1.1434" "AB681778.1.1454" "AF224286.1.1534"
## [4] "AJ491302.1.1509" "AM183097.1.1441" "AY243344.1.1501"
## [7] "AY725811.1.1474" "FJ672272.1.1408" "FJ674571.1.1375"
## [10] "FJ674662.1.1457" "GQ358834.1.1513" "GQ358846.1.1455"
## [13] "HM325988.1.1341" "HM327870.1.1341" "JX986968.1.1511"
## [16] "KC894531.1.1498"
First, the two technical replicates are merged (simple sums as the counts have already been normalized to identical library sizes)
abundances <- df_abundances[ ,grep("A_", colnames(df_abundances))] +
df_abundances[ ,grep("B_", colnames(df_abundances))]
dim(abundances)
## [1] 406 45
which leads to 45 samples (columns) in which 406 bacteria have been observed (rows).
Condition (“EN” or “LBA”) is extracted from column names:
condition <- rep("LBA", ncol(abundances))
condition[grep("EN", colnames(abundances))] <- "EN"
condition <- factor(condition)
table(condition)
## condition
## EN LBA
## 22 23
Also, the farm identifier is extracted from column names:
id_abundances <- as.character(colnames(abundances))
id_abundances <- sapply(id_abundances, function(ac)
substr(ac, nchar(ac) - 19, nchar(ac) - 18))
id_abundances <- gsub("A", "0", id_abundances)
id_abundances <- gsub("N", "0", id_abundances)
table(id_abundances)
## id_abundances
## 01 03 04 06 07 09 10 11 15 16 17 18 19 20 21 23 24 25 26 28 29 30 31
## 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2
All but one farm (idenfier: 29) have been sampled twice, once for each condition.
Duplicated species (based on identical taxonomies) are then merged (counts are summed). To do so, first simplified species names are obtained (last unknown species for every row):
species <- sapply(df_abundances[ ,1], function(aname)
unlist(strsplit(aname, ";")))
species <- sapply(species, function(avect) {
find_unknown <- grep("unknown", avect)
if (length(find_unknown) > 0) {
return(avect[-find_unknown])
} else return(avect)
})
species <- unlist(sapply(species, function(avect) avect[length(avect)]))
species <- unname(species)
Then, counts corresponding to the same simplified name are summed:
abundances <- apply(abundances, 2, function(acol) tapply(acol, species, sum))
and the species names are retrieved:
species <- rownames(abundances)
One species has an unexpected name:
head(species)
## [1] "&"
## [2] "Acetobacteraceae bacterium SAP1007.2"
## [3] "Acholeplasma laidlawii PG-8A"
## [4] "Achromobacter sp."
## [5] "Achromobacter spanius"
## [6] "Actinoalloteichus cyanogriseus"
that corresponds to the row number 73 in the original data (checked also directly in the file sent by Elias on 10/09).
The effect of different normalization is first explored by analyzing the distributions of the counts in the first sample before and after normalization. Distribution before normalization is provided as:
df <- data.frame(abundances)
names(df) <- paste0("sample", 1:ncol(df))
ggplot(df, aes(x = sample1 +1)) + geom_histogram(bins = 50) + scale_y_log10() +
theme_bw() + xlab("counts (sample 1)") + ggtitle("Sample 1 distribution in raw data")
## Warning: Transformation introduced infinite values in continuous y-axis
## Warning: Removed 41 rows containing missing values (geom_bar).
and with a log-transformation by
ggplot(df, aes(x = sample1 + 1)) + geom_histogram(bins = 50) +
theme_bw() + xlab("counts (sample 1)") + scale_x_log10() +
ggtitle("Sample 1 distribution (log scale)")
It is commun in metagenomic datasets to perform TSS (Total Sum Scaling) before further normalization. TSS transformation computes relative abundances: \[ y_{ij} = \frac{n_{ij}}{\sum_{k=1}^p n_{ik}} \] for \(n_{ij}\) the counts of species \(j\) in sample \(i\), \(p\) the number of species and \(n\) the number of individuals.
abundances_TSS <- apply(abundances, 1, function(asample) asample / sum(asample))
df <- as.data.frame(t(abundances_TSS))
names(df) <- paste0("sample", 1:ncol(df))
dim(df)
## [1] 270 45
ggplot(df, aes(x = sample1 + 1)) + geom_histogram(bins = 50) +
theme_bw() + xlab("relative abundance (sample 1)") + scale_x_log10() + ggtitle("Sample 1 distribution (TSS)")
The next two histograms are based on the normalized counts with:
abundances_CLR <- logratio.transfo(abundances_TSS, logratio = "CLR",
offset = 1)
class(abundances_CLR) <- "matrix"
df <- data.frame(t(abundances_CLR))
names(df) <- paste0("sample", 1:ncol(df))
dim(df)
## [1] 270 45
ggplot(df, aes(x = sample1)) + geom_histogram(bins = 50) + theme_bw() +
xlab("counts (sample 1)") + ggtitle("Sample 1 distribution (CLR)")
abundances_ILR <- logratio.transfo(abundances_TSS, logratio = "ILR",
offset = 1)
class(abundances_ILR) <- "matrix"
df <- data.frame(t(abundances_ILR))
names(df) <- paste0("sample", 1:ncol(df))
dim(df)
## [1] 269 45
ggplot(df, aes(x = sample1)) + geom_histogram(bins = 50) + theme_bw() +
xlab("counts (sample 1)") + ggtitle("Sample 1 distribution (ILR)")
abundances_CSS <- newMRexperiment(abundances)
abundances_CSS <- cumNorm(abundances_CSS)
## Default value being used.
df <- data.frame(MRcounts(abundances_CSS))
names(df) <- paste0("sample", 1:ncol(df))
dim(df)
## [1] 270 45
ggplot(df, aes(x = sample1 + 1)) + geom_histogram(bins = 50) + theme_bw() +
xlab("counts (sample 1)") + scale_y_log10() + scale_x_log10()
## Warning: Transformation introduced infinite values in continuous y-axis
## Warning: Removed 16 rows containing missing values (geom_bar).
The less asymetric distribution seems to be the one obtained with the CLR transformation and the log-transformed CSS.
Distributions of all samples according to the type of transformation and the sample is provided below:
df_log <- log10(abundances + 1)
df_log <- data.frame(df_log)
rownames(df_log) <- NULL
names(df_log) <- paste0("Sample", 1:ncol(df_log))
df_log <- melt(df_log)
df_CLR <- data.frame(t(abundances_CLR))
names(df_CLR) <- paste0("Sample", 1:ncol(df_CLR))
df_CLR <- melt(df_CLR)
df_ILR <- data.frame(t(abundances_ILR))
names(df_ILR) <- paste0("Sample", 1:ncol(df_ILR))
df_ILR <- melt(df_ILR)
df_CSS <- data.frame(log(MRcounts(abundances_CSS)) + 1)
names(df_CSS) <- paste0("Sample", 1:ncol(df_CSS))
df_CSS <- melt(df_CSS)
all_sizes <- c(nrow(df_log), nrow(df_CLR), nrow(df_ILR), nrow(df_CSS))
df <- data.frame(rbind(df_log, df_CLR, df_ILR, df_CSS),
"type" = rep(c("log", "CLR", "ILR", "log-CSS"), all_sizes))
ggplot(df, aes(x = variable, y = value)) + geom_boxplot() + theme_bw() +
facet_wrap(~ type, scales = "free_y") + xlab("samples") +
theme(axis.text.x = element_blank())
## Warning: Removed 6381 rows containing non-finite values (stat_boxplot).
Log
A first exploratory analysis is performed with PCA on (merged) raw counts with log transformation:
pca_raw <- pca(log(t(abundances) + 1), ncomp = ncol(abundances),
logratio = 'none')
plot(pca_raw)
that shows a good percentage of explained variance for the first axis.
Projection of the individuals on the first two PCs also shows a good separation between the two conditions:
plotIndiv(pca_raw,
comp = c(1,2),
pch = 16,
ind.names = FALSE,
group = condition,
col.per.group = color.mixo(1:2),
legend = TRUE,
title="PCA for log-tranformed data")
TSS+CLR
The same analysis is used with TSS normalized counts subsequently transformed by CLR or ILR (which is the expected analysis):
pca_CLR <- pca(abundances_TSS + 1, ncomp = nrow(abundances_TSS),
logratio = 'CLR')
plot(pca_CLR)
plotIndiv(pca_CLR,
comp = c(1,2),
pch = 16,
ind.names = FALSE,
group = condition,
col.per.group = color.mixo(1:2),
legend = TRUE,
title="PCA for CLR-tranformed data")
TSS+ILR
pca_ILR <- pca(abundances_TSS + 1, ncomp = nrow(abundances_TSS) - 1,
logratio = 'ILR')
plot(pca_ILR)
plotIndiv(pca_ILR,
comp = c(1,2),
pch = 16,
ind.names = FALSE,
group = condition,
col.per.group = color.mixo(1:2),
legend = TRUE,
title="PCA for ILR-tranformed data")
CSS
log_CSS <- log(MRcounts(abundances_CSS) + 1)
pca_CSS <- pca(t(log_CSS), ncomp = ncol(log_CSS))
plot(pca_CSS)
plotIndiv(pca_CSS,
comp = c(1,2),
pch = 16,
ind.names = FALSE,
group = condition,
col.per.group = color.mixo(1:2),
legend = TRUE,
title="PCA for CSS-tranformed data")
A first PLS-DA is computed (with 10-fold CV) to check the efficiency of the method and which type of distance to use in its computation.
set.seed(11)
res_plsda <- plsda(log(t(abundances)+1), condition, ncomp = nlevels(condition))
res_perf <- perf(res_plsda, validation = 'Mfold', folds = 5,
progressBar = FALSE, nrepeat = 20)
plot(res_perf, overlay = 'measure', sd = TRUE)
plotIndiv(res_plsda , comp = c(1, 2), ind.names = FALSE, ellipse = TRUE,
legend = TRUE, title = 'PLS-DA Comp 1 - 2')
PLS-DA shows a good separation between the two groups and indicates that the Mahalanobis distance provides the lower overall classification error.
Then, sparse PLS-DA is used (with the multilevel approach) to check which number of components to select.
Creation of data set log transformed and without the calf 29:
clean_log <- data.frame(log(t(abundances[ ,id_abundances != "29"]) + 1))
names(clean_log) <- species
clean_condition <- factor(condition[id_abundances != "29"])
clean_id <- factor(id_abundances[id_abundances != "29"])
set.seed(33)
res_plsda <- tune.splsda(clean_log, clean_condition,
ncomp = nlevels(clean_condition),
multilevel = clean_id,
test.keepX = 1:20, validation = 'Mfold', folds = 10,
dist = 'mahalanobis.dist', nrepeat = 10,
progressBar = FALSE)
## Splitting the variation for 1 level factor.
plot(res_plsda)
sel_keepX <- res_plsda$choice.keepX[1:2]
sel_keepX
## comp1 comp2
## 5 14
Finally sparse PLS-DA is performed and the variables explaining the two types of samples are obtained:
res_splsda <- splsda(clean_log, clean_condition,
ncomp = nlevels(clean_condition), multilevel = clean_id,
keepX = sel_keepX)
## Splitting the variation for 1 level factor.
plotIndiv(res_splsda, comp = c(1,2), ind.names = FALSE, ellipse = TRUE,
legend = TRUE, title = 'sPLS-DA Comp 1 - 2')
head(selectVar(res_splsda, comp = 1)$name)
## [1] "Mycoplasma bovoculi M165/69" "Corynebacterium camporealensis"
## [3] "Moraxella" "Aquamicrobium"
## [5] "Peptoclostridium"
plotLoadings(res_splsda, comp = 1, method = 'mean', contrib = 'max',
size.title = 1)
plotLoadings(res_splsda, comp = 2, method = 'mean', contrib = 'max',
size.title = 1)
# cim(res_splsda,comp=1,row.names = paste(clean_id,clean_condition,sep='_'),title="Heatmap with splsda values")
# test<-clean_log[,which(species%in%selectVar(res_splsda, comp = 1)$name)]
# cim(sapply(test,as.numeric),row.names = paste(clean_id,clean_condition,sep='_'),title="Heatmap with clean_log values")
EN vs LBA: Student test on selected Variables
cat("Selected variables:",paste(selectVar(res_splsda, comp = 1)$name,collapse=', '))
## Selected variables: Mycoplasma bovoculi M165/69, Corynebacterium camporealensis, Moraxella, Aquamicrobium, Peptoclostridium
testEN <- clean_log[clean_condition=="EN",grep(selectVar(res_splsda, comp = 1)$name[1],species)]
testLBA <- clean_log[clean_condition=="LBA",grep(selectVar(res_splsda, comp = 1)$name[1],species)]
t.test(testEN,testLBA,paired=TRUE)
##
## Paired t-test
##
## data: testEN and testLBA
## t = 14.059, df = 21, p-value = 3.728e-12
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 4.717008 6.354767
## sample estimates:
## mean of the differences
## 5.535887
We reject \(H_0\) (the difference in means is equal to 0) hence we can say that there is a difference between the two types of samples (EN and LBA) for the Mycoplasma bovoculi M165/69 bateria.
boxplot(testEN,testLBA,names=c("EN","LBA"),
main=selectVar(res_splsda, comp = 1)$name[1],col=c("dodgerblue2","darkorange"))
testEN <- clean_log[clean_condition=="EN",grep(selectVar(res_splsda, comp = 1)$name[2],species)]
testLBA <- clean_log[clean_condition=="LBA",grep(selectVar(res_splsda, comp = 1)$name[2],species)]
t.test(testEN,testLBA,paired=TRUE)
##
## Paired t-test
##
## data: testEN and testLBA
## t = 13.686, df = 21, p-value = 6.214e-12
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 2.967682 4.031138
## sample estimates:
## mean of the differences
## 3.49941
We reject \(H_0\) (the difference in means is equal to 0) hence we can say that there is a difference between the two types of samples (EN and LBA) for the Corynebacterium camporealensis bateria.
boxplot(testEN,testLBA,names=c("EN","LBA"),
main=selectVar(res_splsda, comp = 1)$name[2],col=c("dodgerblue2","darkorange"))
testEN <- clean_log[clean_condition=="EN",which(selectVar(res_splsda, comp = 1)$name[3]==species)]
testLBA <- clean_log[clean_condition=="LBA",which(selectVar(res_splsda, comp = 1)$name[3]==species)]
t.test(testEN,testLBA,paired=TRUE)
##
## Paired t-test
##
## data: testEN and testLBA
## t = 12.748, df = 21, p-value = 2.368e-11
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 4.617779 6.418044
## sample estimates:
## mean of the differences
## 5.517911
We reject \(H_0\) (the difference in means is equal to 0) hence we can say that there is a difference between the two types of samples (EN and LBA) for the Moraxella bateria.
boxplot(testEN,testLBA,names=c("EN","LBA"),
main=selectVar(res_splsda, comp = 1)$name[3],col=c("dodgerblue2","darkorange"))
testEN <- clean_log[clean_condition=="EN",which(selectVar(res_splsda, comp = 1)$name[4]==species)]
testLBA <- clean_log[clean_condition=="LBA",which(selectVar(res_splsda, comp = 1)$name[4]==species)]
t.test(testEN,testLBA,paired=TRUE)
##
## Paired t-test
##
## data: testEN and testLBA
## t = 12.382, df = 21, p-value = 4.077e-11
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 3.257027 4.571954
## sample estimates:
## mean of the differences
## 3.914491
We reject \(H_0\) (the difference in means is equal to 0) hence we can say that there is a difference between the two types of samples (EN and LBA) for the Aquamicrobium bateria.
boxplot(testEN,testLBA,names=c("EN","LBA"),
main=selectVar(res_splsda, comp = 1)$name[4],col=c("dodgerblue2","darkorange"))
testEN <- clean_log[clean_condition=="EN",which(selectVar(res_splsda, comp = 1)$name[5]==species)]
testLBA <- clean_log[clean_condition=="LBA",which(selectVar(res_splsda, comp = 1)$name[5]==species)]
t.test(testEN,testLBA,paired=TRUE)
##
## Paired t-test
##
## data: testEN and testLBA
## t = 12.319, df = 21, p-value = 4.484e-11
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 3.266202 4.592981
## sample estimates:
## mean of the differences
## 3.929592
We reject \(H_0\) (the difference in means is equal to 0) hence we can say that there is a difference between the two types of samples (EN and LBA) for the Peptoclostridium bateria.
boxplot(testEN,testLBA,names=c("EN","LBA"),
main=selectVar(res_splsda, comp = 1)$name[5],col=c("dodgerblue2","darkorange"))
abundances_CLR <- data.frame(abundances_CLR)
names(abundances_CLR) <- species
set.seed(11)
res_plsda <- plsda(abundances_CLR, condition, ncomp = nlevels(condition))
res_perf <- perf(res_plsda, validation = 'Mfold', folds = 5,
progressBar = FALSE, nrepeat = 20)
plot(res_perf, overlay = 'measure', sd = TRUE)
plotIndiv(res_plsda , comp = c(1, 2), ind.names = FALSE, ellipse = TRUE,
legend = TRUE, title = 'PLS-DA (CLR)')
clean_CLR <- data.frame(abundances_CLR[id_abundances != "29", ])
names(clean_CLR) <- species
set.seed(33)
res_plsda <- tune.splsda(clean_CLR, clean_condition,
ncomp = nlevels(clean_condition),
multilevel = clean_id,
test.keepX = 1:20, validation = 'Mfold',
folds = 10, dist = 'mahalanobis.dist', nrepeat = 10,
progressBar = FALSE)
plot(res_plsda)
sel_keepX <- res_plsda$choice.keepX[1:2]
sel_keepX
res_splsda <- splsda(clean_CLR, clean_condition,
ncomp = nlevels(clean_condition), multilevel = clean_id,
keepX = sel_keepX)
plotIndiv(res_splsda, comp = c(1,2), ind.names = FALSE, ellipse = TRUE,
legend = TRUE, title = 'sPLS-DA (CLR)')
plotLoadings(res_splsda, comp = 1, method = 'mean', contrib = 'max')
plotLoadings(res_splsda, comp = 2, method = 'mean', contrib = 'max')
abundances_CSS <- data.frame(t(log_CSS))
names(abundances_CSS) <- species
set.seed(11)
res_plsda <- plsda(abundances_CSS, condition, ncomp = nlevels(condition))
res_perf <- perf(res_plsda, validation = 'Mfold', folds = 5,
progressBar = FALSE, nrepeat = 20)
plot(res_perf, overlay = 'measure', sd = TRUE)
plotIndiv(res_plsda , comp = c(1, 2), ind.names = FALSE, ellipse = TRUE,
legend = TRUE, title = 'PLS-DA (CSS)')
clean_CSS <- data.frame(abundances_CSS[id_abundances != "29", ])
names(clean_CSS) <- species
set.seed(33)
res_plsda <- tune.splsda(clean_CSS, clean_condition,
ncomp = nlevels(clean_condition),
multilevel = clean_id,
test.keepX = 1:20, validation = 'Mfold',
folds = 10, dist = 'mahalanobis.dist', nrepeat = 10,
progressBar = FALSE)
## Splitting the variation for 1 level factor.
plot(res_plsda)
sel_keepX <- res_plsda$choice.keepX[1:2]
sel_keepX
## comp1 comp2
## 5 14
res_splsda <- splsda(clean_CSS, clean_condition,
ncomp = nlevels(clean_condition), multilevel = clean_id,
keepX = sel_keepX)
## Splitting the variation for 1 level factor.
plotIndiv(res_splsda, comp = c(1,2), ind.names = FALSE, ellipse = TRUE,
legend = TRUE, title = 'sPLS-DA (CSS)')
plotLoadings(res_splsda, comp = 1, method = 'mean', contrib = 'max')
plotLoadings(res_splsda, comp = 2, method = 'mean', contrib = 'max')
set.seed(33)
res_plsda <- tune.splsda(log(t(abundances)+1), condition,
ncomp = nlevels(condition),
test.keepX = 1:20, validation = 'Mfold',
folds = 10, dist = 'mahalanobis.dist', nrepeat = 10,
progressBar = FALSE)
plot(res_plsda)
sel_keepX <- res_plsda$choice.keepX[1:2]
sel_keepX
## comp1 comp2
## 20 15
res_splsda <- splsda(log(t(abundances)+1), condition,
ncomp = nlevels(condition), keepX = sel_keepX)
plotIndiv(res_splsda, comp = c(1,2), ind.names = FALSE, ellipse = TRUE,
legend = TRUE, title = 'sPLS-DA (log)')
plotLoadings(res_splsda, comp = 1, method = 'mean', contrib = 'max',
size.title = 1)
plotLoadings(res_splsda, comp = 2, method = 'mean', contrib = 'max',
size.title = 1)
set.seed(11)
res_plsda <- plsda(abundances_CLR, condition, ncomp = nlevels(condition))
res_perf <- perf(res_plsda, validation = 'Mfold', folds = 5,
progressBar = FALSE, nrepeat = 20)
plot(res_perf, overlay = 'measure', sd = TRUE)
plotIndiv(res_plsda , comp = c(1, 2), ind.names = FALSE, ellipse = TRUE,
legend = TRUE, title = 'PLS-DA (CLR)')
set.seed(33)
res_plsda <- tune.splsda(abundances_CLR, condition, ncomp = nlevels(condition),
test.keepX = 1:20, validation = 'Mfold',
folds = 10, dist = 'mahalanobis.dist', nrepeat = 10,
progressBar = F)
plot(res_plsda)
sel_keepX <- res_plsda$choice.keepX[1:2]
sel_keepX
res_splsda <- splsda(abundances_CLR, condition, ncomp = nlevels(condition),
keepX = sel_keepX)
plotIndiv(res_splsda, comp = c(1,2), ind.names = FALSE, ellipse = TRUE,
legend = TRUE, title = 'sPLS-DA (CLR)')
plotLoadings(res_splsda, comp = 1, method = 'mean', contrib = 'max',
size.title = 1)
plotLoadings(res_splsda, comp = 2, method = 'mean', contrib = 'max',
size.title = 1)
In this section will be performed Random Forest algorithm for classification, with a virus as response vector and all bacteria as explanatory variables in the model.
library(randomForest,quietly=T)
Data preparation:
df_pathogenes<-read.delim("../data/pathogenes.csv",sep = ",")
pathogenes<-df_pathogenes[,-c(1,9)]
pathogenes<-as.data.frame(ifelse(pathogenes=='p',1,0))
pathogenes<-as.data.frame(apply(t(pathogenes),1,as.factor))
id_pathogenes<-df_pathogenes[,1]
id_pathogenes <- gsub("-","0",id_pathogenes)
id_pathogenes <- gsub(" ","",id_pathogenes)
id_pathogenes <- sapply(as.character(id_pathogenes),FUN=function(x){substr(x,nchar(x)-1,nchar(x))})
id_pathogenes <- id_pathogenes[id_pathogenes!="29"]
id_pathogenes <- paste0(id_pathogenes,"_",rev(condition[id_abundances!="29"]))
The list of calf names is not in the same order in the pathogenes file as in the abundances file. The abundances file order will be kept:
id_abundances_bis <- paste0(id_abundances[-grep("29",id_abundances)],"_",
condition[-grep("29",id_abundances)])
pathogenes <- pathogenes[df_pathogenes[,1]!="ICSA-29",]
Y<-pathogenes[match(id_abundances_bis,id_pathogenes),]
rownames(Y)<-id_abundances_bis
Y contains the seven response vectors.
cat(" 0 1\n\n")
## 0 1
for (virus in colnames(Y)){
cat(virus,"\n",table(Y[,virus]),fill=T)
}
## Ct.RSV
## 37 7
## Ct.PI.3
## 41 3
## Ct.Coronavirus
## 22 22
## Ct.P.multocida
## 6 38
## Ct.M.haemolytica
## 31 13
## Ct.M.bovis
## 39 5
## Ct.H.somni
## 26 18
The bacteria names are too long for the plots, they will be renamed:
X <- clean_log
X <- as.data.frame(sapply(X,as.integer))
colnames(X)<- make.names(names(X))
RSV:fit <- randomForest(Y$Ct.RSV ~ ., data = X,classwt=table(Y$Ct.RSV),ntree=3000)
fit$confusion
## 0 1 class.error
## 0 36 1 0.02702703
## 1 7 0 1.00000000
varImpPlot(fit)
round(fit$importance[order(fit$importance[, 1], decreasing = TRUE), ][1:5],3)
plot(fit$err.rate[, 1], type = "l", xlab = "number of trees", ylab = "error OOB")
cat(fit$predicted)
PI.3:fit <- randomForest(Y$Ct.PI.3 ~ ., data = X,classwt=table(Y$Ct.PI.3),ntree=1000,mtry=10)
fit$confusion
## 0 1 class.error
## 0 41 0 0
## 1 3 0 1
Nothing in class 1…
varImpPlot(fit)
round(fit$importance[order(fit$importance[, 1], decreasing = TRUE), ][1:5],3)
plot(fit$err.rate[, 1], type = "l", xlab = "number of trees", ylab = "error OOB")
P.multocida:fit <- randomForest(Y$Ct.P.multocida ~ ., data = X,classwt=table(Y$Ct.P.multocida),ntree=1000,mtry=16)
fit$confusion
## 0 1 class.error
## 0 0 6 1
## 1 0 38 0
Nothing in class 0
varImpPlot(fit)
fit$importance[order(fit$importance[, 1], decreasing = TRUE), ][1:5]
plot(fit$err.rate[, 1], type = "l", xlab = "nombre d'arbres", ylab = "erreur OOB")
M.haemolytica:fit <- randomForest(Y$Ct.M.haemolytica ~ ., data = X,classwt=table(Y$Ct.M.haemolytica),ntree=1000,mtry=40)
fit$confusion
## 0 1 class.error
## 0 30 1 0.03225806
## 1 13 0 1.00000000
varImpPlot(fit)
fit$importance[order(fit$importance[, 1], decreasing = TRUE), ][1:5]
plot(fit$err.rate[, 1], type = "l", xlab = "number of trees", ylab = "error OOB")
M.bovis:fit <- randomForest(Y$Ct.M.bovis ~ ., data = X,classwt=table(Y$Ct.M.bovis) , ntree=500,mtry=40)
fit$confusion
## 0 1 class.error
## 0 39 0 0
## 1 5 0 1
varImpPlot(fit)
fit$importance[order(fit$importance[, 1], decreasing = TRUE), ][1:5]
plot(fit$err.rate[, 1], type = "l", xlab = "number of trees", ylab = "error OOB")
H.somni:fit <- randomForest(Y$Ct.H.somni ~ ., data = X,classwt=table(Y$Ct.H.somni),ntree=1000,mtry=16)
fit$confusion
## 0 1 class.error
## 0 24 2 0.07692308
## 1 6 12 0.33333333
varImpPlot(fit)
fit$importance[order(fit$importance[, 1], decreasing = TRUE), ][1:5]
plot(fit$err.rate[, 1], type = "l", xlab = "number of trees", ylab = "error OOB")
EN conditionStudy only on EN condition samples
X_EN<-X[grep("EN",colnames(abundances)),]
X_LBA<-X[grep("LBA",rownames(Y)),]
Y_EN<-Y[grep("EN",colnames(abundances)),]
Y_LBA<-Y[grep("LBA",rownames(Y)),]
clean_log_EN <- clean_log[grep("EN",rownames(clean_log)),]
clean_id_EN <- clean_id[grep("EN",names(clean_id))]
clean_log_LBA <- clean_log[grep("LBA",rownames(clean_log)),]
clean_id_LBA <- clean_id[grep("LBA",names(clean_id))]
RSV - EN condition:fit <- randomForest(Y_EN$Ct.RSV ~ ., data = X_EN,classwt=table(Y_EN$Ct.H.somni), ntree=5000)
fit$confusion
## 0 1 class.error
## 0 21 0 0
## 1 0 0 NaN
PI.3 - EN condition:fit <- randomForest(Y_EN$Ct.PI.3 ~ ., data = X_EN,classwt=table(Y_EN$Ct.H.somni), ntree=5000)
fit$confusion
## 0 1 class.error
## 0 20 0 0
## 1 2 0 1
M.haemolytica - EN condition:fit <- randomForest(Y_EN$Ct.M.haemolytica~ ., data = X_EN,classwt=table(Y_EN$Ct.H.somni), ntree=5000)
fit$confusion
## 0 1 class.error
## 0 12 2 0.1428571
## 1 8 0 1.0000000
M.bovis - EN condition:fit <- randomForest(Y_EN$Ct.M.bovis~ ., data = X_EN,classwt=table(Y_EN$Ct.H.somni), ntree=5000)
fit$confusion
## 0 1 class.error
## 0 20 0 0
## 1 2 0 1
P.multocida - EN condition:fit <- randomForest(Y_EN$Ct.P.multocida ~ ., data = X_EN,classwt=table(Y_EN$Ct.H.somni), ntree=5000)
fit$confusion
## 0 1 class.error
## 0 0 2 1
## 1 0 20 0
H.somni - EN condition:fit <- randomForest(Y_EN$Ct.H.somni ~ ., data = X_EN,classwt=table(Y_EN$Ct.H.somni), ntree=5000)
fit$confusion
## 0 1 class.error
## 0 12 1 0.07692308
## 1 4 5 0.44444444
RSVset.seed(11)
res_plsda <- plsda(X=clean_log,Y=Y$Ct.RSV, ncomp = nlevels(clean_condition))
res_perf <- perf(res_plsda, validation = 'Mfold', folds = 5,
progressBar = FALSE, nrepeat = 20)
plot(res_perf, overlay = 'measure', sd = TRUE)
plotIndiv(res_plsda , comp = c(1, 2), ind.names = FALSE, ellipse = TRUE,
legend = TRUE, title = 'PLS-DA (RSV)')
set.seed(33)
res_plsda <- tune.splsda(X=clean_log,Y=Y$Ct.RSV,
ncomp = nlevels(clean_condition),
multilevel = clean_id,
test.keepX = 1:20, validation = 'Mfold',
folds = 10, dist = 'mahalanobis.dist', nrepeat = 10,
progressBar = F)
## Splitting the variation for 1 level factor.
plot(res_plsda)
sel_keepX <- res_plsda$choice.keepX[1:2]
sel_keepX
## comp1 comp2
## 13 14
res_splsda <- splsda(X=clean_log,Y=Y$Ct.RSV,
ncomp = nlevels(clean_condition),
multilevel = clean_id,
keepX = sel_keepX)
## Splitting the variation for 1 level factor.
plotIndiv(res_splsda, comp = c(1,2), ind.names = FALSE, ellipse = TRUE,
legend = TRUE, title = 'sPLS-DA (RSV)')
test <- predict(res_splsda,clean_log)
pred <- test$class$mahalanobis.dist[,2]
get.confusion_matrix(Y$Ct.RSV,predicted=pred)
## predicted.as.0 predicted.as.1
## 0 36 1
## 1 6 1
head(selectVar(res_splsda, comp = 1)$name)
## [1] "Lactococcus lactis" "Saccharibacteria"
## [3] "Moheibacter" "Streptococcus gallolyticus"
## [5] "Facklamia" "Flavobacterium"
plotLoadings(res_splsda, comp = 1, method = 'mean', contrib = 'max',
size.title = 1)
plotLoadings(res_splsda, comp = 2, method = 'mean', contrib = 'max',
size.title = 1)
PI.3set.seed(11)
res_plsda <- plsda(X=clean_log,Y=Y$Ct.PI.3, ncomp = nlevels(clean_condition))
res_perf <- perf(res_plsda, validation = 'Mfold', folds = 5,
progressBar = FALSE, nrepeat = 20)
## Warning in MCVfold.splsda(X, Y, multilevel = multilevel, validation = validation, : At least one class is not represented in one fold, which may unbalance the error rate.
## Consider a number of folds lower than the minimum in table(Y): 3
## Warning in MCVfold.splsda(X, Y, multilevel = multilevel, validation = validation, : At least one class is not represented in one fold, which may unbalance the error rate.
## Consider a number of folds lower than the minimum in table(Y): 3
plot(res_perf, overlay = 'measure', sd = TRUE)
plotIndiv(res_plsda , comp = c(1, 2), ind.names = FALSE, ellipse = TRUE,
legend = TRUE, title = 'PLS-DA (PI.3)')
set.seed(33)
res_plsda <- tune.splsda(X=clean_log,Y=Y$Ct.PI.3,
ncomp = nlevels(clean_condition),
multilevel = clean_id,
test.keepX = 1:20, validation = 'Mfold',
folds = 10, dist = 'mahalanobis.dist', nrepeat = 10,
progressBar = F)
plot(res_plsda)
sel_keepX <- res_plsda$choice.keepX[1:2]
sel_keepX
ERROR: Splitting the variation for 1 level factor. The SGCCA algorithm did not convergeThe SGCCA algorithm did not convergeThe SGCCA algorithm did not convergeThe SGCCA algorithm did not convergeThe SGCCA algorithm did not convergeThe SGCCA algorithm did not convergeThe SGCCA algorithm did not convergeThe SGCCA algorithm did not convergeThe SGCCA algorithm did not convergeThe SGCCA algorithm did not convergeThe SGCCA algorithm did not convergeThe SGCCA algorithm did not convergeThe SGCCA algorithm did not convergeThe SGCCA algorithm did not convergeThe SGCCA algorithm did not convergeThe SGCCA algorithm did not convergeThe SGCCA algorithm did not convergeThe SGCCA algorithm did not convergeThe SGCCA algorithm did not convergeThe SGCCA algorithm did not converge
Error in colnames<-(*tmp*, value = c(“0”, “1”)) : length of ‘dimnames’ [2] not equal to array extent
res_splsda <- splsda(X=clean_log,Y=Y$Ct.PI.3,
ncomp = nlevels(clean_condition),
multilevel = clean_id,
keepX = sel_keepX)
plotIndiv(res_splsda, comp = c(1,2), ind.names = FALSE, ellipse = TRUE,
legend = TRUE, title = 'sPLS-DA (PI.3)')
head(selectVar(res_splsda, comp = 1)$name)
plotLoadings(res_splsda, comp = 1, method = 'mean', contrib = 'max',
size.title = 1)
plotLoadings(res_splsda, comp = 2, method = 'mean', contrib = 'max',
size.title = 1)
P.multocidaset.seed(11)
res_plsda <- plsda(X=clean_log,Y=Y$Ct.P.multocida, ncomp = nlevels(clean_condition))
res_perf <- perf(res_plsda, validation = 'Mfold', folds = 5,
progressBar = FALSE, nrepeat = 20)
plot(res_perf, overlay = 'measure', sd = TRUE)
plotIndiv(res_plsda , comp = c(1, 2), ind.names = FALSE, ellipse = TRUE,
legend = TRUE, title = 'PLS-DA (P.multocida)')
set.seed(33)
res_plsda <- tune.splsda(X=clean_log,Y=Y$Ct.P.multocida,
ncomp = nlevels(clean_condition),
multilevel = clean_id,
test.keepX = 1:20, validation = 'Mfold',
folds = 10, dist = 'mahalanobis.dist', nrepeat = 10,
progressBar = F)
## Splitting the variation for 1 level factor.
plot(res_plsda)
sel_keepX <- res_plsda$choice.keepX[1:2]
sel_keepX
## comp1 comp2
## 1 16
res_splsda <- splsda(X=clean_log,Y=Y$Ct.P.multocida,
ncomp = nlevels(clean_condition),
multilevel = clean_id,
keepX = sel_keepX)
## Splitting the variation for 1 level factor.
plotIndiv(res_splsda, comp = c(1,2), ind.names = FALSE, ellipse = TRUE,
legend = TRUE, title = 'sPLS-DA (P.multocida)')
head(selectVar(res_splsda, comp = 1)$name)
## [1] "Saccharomonospora glauca K62"
plotLoadings(res_splsda, comp = 1, method = 'mean', contrib = 'max',
size.title = 1)
plotLoadings(res_splsda, comp = 2, method = 'mean', contrib = 'max',
size.title = 1)
M.haemolyticaset.seed(11)
res_plsda <- plsda(X=clean_log,Y=Y$Ct.M.haemolytica, ncomp = nlevels(clean_condition))
res_perf <- perf(res_plsda, validation = 'Mfold', folds = 5,
progressBar = FALSE, nrepeat = 20)
plot(res_perf, overlay = 'measure', sd = TRUE)
plotIndiv(res_plsda , comp = c(1, 2), ind.names = FALSE, ellipse = TRUE,
legend = TRUE, title = 'PLS-DA (M.haemolytica)')
set.seed(33)
res_plsda <- tune.splsda(X=clean_log,Y=Y$Ct.M.haemolytica,
ncomp = nlevels(clean_condition),
multilevel = clean_id,
test.keepX = 1:20, validation = 'Mfold',
folds = 10, dist = 'mahalanobis.dist', nrepeat = 10,
progressBar = F)
## Splitting the variation for 1 level factor.
plot(res_plsda)
sel_keepX <- res_plsda$choice.keepX[1:2]
sel_keepX
## comp1 comp2
## 18 13
res_splsda <- splsda(X=clean_log,Y=Y$Ct.M.haemolytica,
ncomp = nlevels(clean_condition),
multilevel = clean_id,
keepX = sel_keepX)
## Splitting the variation for 1 level factor.
plotIndiv(res_splsda, comp = c(1,2), ind.names = FALSE, ellipse = TRUE,
legend = TRUE, title = 'sPLS-DA (M.haemolytica)')
head(selectVar(res_splsda, comp = 1)$name)
## [1] "Eubacterium rectale" "Propionibacteriaceae"
## [3] "Bacteroidales S24-7 group" "Bacillus sp."
## [5] "[Ruminococcus] gauvreauii group" "Prevotellaceae"
plotLoadings(res_splsda, comp = 1, method = 'mean', contrib = 'max',
size.title = 1)
plotLoadings(res_splsda, comp = 2, method = 'mean', contrib = 'max',
size.title = 1)
M.bovisset.seed(11)
res_plsda <- plsda(X=clean_log,Y=Y$Ct.M.bovis, ncomp = nlevels(clean_condition))
res_perf <- perf(res_plsda, validation = 'Mfold', folds = 5,
progressBar = FALSE, nrepeat = 20)
## Warning in MCVfold.splsda(X, Y, multilevel = multilevel, validation = validation, : At least one class is not represented in one fold, which may unbalance the error rate.
## Consider a number of folds lower than the minimum in table(Y): 4
## Warning in MCVfold.splsda(X, Y, multilevel = multilevel, validation = validation, : At least one class is not represented in one fold, which may unbalance the error rate.
## Consider a number of folds lower than the minimum in table(Y): 4
plot(res_perf, overlay = 'measure', sd = TRUE)
plotIndiv(res_plsda , comp = c(1, 2), ind.names = FALSE, ellipse = TRUE,
legend = TRUE, title = 'PLS-DA (M.bovis)')
set.seed(33)
res_plsda <- tune.splsda(X=clean_log,Y=Y$Ct.M.bovis,
ncomp = nlevels(clean_condition),
multilevel = clean_id,
test.keepX = 1:20, validation = 'Mfold',
folds = 10, dist = 'mahalanobis.dist', nrepeat = 10,
progressBar = F)
## Splitting the variation for 1 level factor.
plot(res_plsda)
sel_keepX <- res_plsda$choice.keepX[1:2]
sel_keepX
## comp1 comp2
## 12 7
res_splsda <- splsda(X=clean_log,Y=Y$Ct.M.bovis,
ncomp = nlevels(clean_condition),
multilevel = clean_id,
keepX = sel_keepX)
## Splitting the variation for 1 level factor.
plotIndiv(res_splsda, comp = c(1,2), ind.names = FALSE, ellipse = TRUE,
legend = TRUE, title = 'sPLS-DA (M.bovis)')
head(selectVar(res_splsda, comp = 1)$name)
## [1] "Mycoplasma bovis" "Eubacteriaceae bacterium BL-152"
## [3] "Atopobium fossor" "Filifactor alocis ATCC 35896"
## [5] "Bacteroides pyogenes" "Bergeyella"
plotLoadings(res_splsda, comp = 1, method = 'mean', contrib = 'max',
size.title = 1)
plotLoadings(res_splsda, comp = 2, method = 'mean', contrib = 'max',
size.title = 1)
H.somniset.seed(11)
res_plsda <- plsda(X=clean_log,Y=Y$Ct.H.somni, ncomp = nlevels(clean_condition))
res_perf <- perf(res_plsda, validation = 'Mfold', folds = 5,
progressBar = FALSE, nrepeat = 20)
plot(res_perf, overlay = 'measure', sd = TRUE)
plotIndiv(res_plsda , comp = c(1, 2), ind.names = FALSE, ellipse = TRUE,
legend = TRUE, title = 'PLS-DA (H.somni)')
set.seed(33)
res_plsda <- tune.splsda(X=clean_log,Y=Y$Ct.H.somni,
ncomp = nlevels(clean_condition),
multilevel = clean_id,
test.keepX = 1:20, validation = 'Mfold',
folds = 10, dist = 'mahalanobis.dist', nrepeat = 10,
progressBar = F)
## Splitting the variation for 1 level factor.
plot(res_plsda)
sel_keepX <- res_plsda$choice.keepX[1:2]
sel_keepX
## comp1 comp2
## 2 5
res_splsda <- splsda(X=clean_log,Y=Y$Ct.H.somni,
ncomp = nlevels(clean_condition),
multilevel = clean_id,
keepX = sel_keepX)
## Splitting the variation for 1 level factor.
plotIndiv(res_splsda, comp = c(1,2), ind.names = FALSE, ellipse = TRUE,
legend = TRUE, title = 'sPLS-DA (H.somni)')
head(selectVar(res_splsda, comp = 1)$name)
## [1] "Streptococcus" "Actinoalloteichus cyanogriseus"
plotLoadings(res_splsda, comp = 1, method = 'mean', contrib = 'max',
size.title = 1)
plotLoadings(res_splsda, comp = 2, method = 'mean', contrib = 'max',
size.title = 1)
EN conditionpls <- pls(as.matrix(sapply(clean_log_EN,as.numeric)), as.matrix(sapply(Y_EN,as.numeric)), ncomp = 2, mode = "regression")
## Warning in cor(A[[k]], variates.A[[k]]): l'écart type est nulle
spls <- spls(as.matrix(sapply(clean_log_EN,as.numeric)), as.matrix(sapply(Y_EN,as.numeric)), ncomp =2,
keepX= c(15,15), keepY=c(7,7), mode = "regression")
## Warning in cor(A[[k]], variates.A[[k]]): l'écart type est nulle
tune.pls <- perf(pls, validation = "Mfold", folds = 20,
progressBar = FALSE, nrepeat = 10)
## Warning in cor(A[[k]], variates.A[[k]]): l'écart type est nulle
tune.spls <- perf(spls, validation = "Mfold", folds = 20,
progressBar = FALSE, nrepeat = 10)
## Warning in cor(A[[k]], variates.A[[k]]): l'écart type est nulle
tune.pls$Q2.total
## Q2.total
## 1 comp -0.05644524
## 2 comp -0.10451848
plot(tune.pls$Q2.total)
abline(h=0.0975)
plotIndiv(spls, comp = 1:2, rep.space= 'Y-variate', ind.names = clean_id[grep("EN",names(clean_id))],
legend = TRUE, title = 'sPLS comp 1 - 2, Y-space')
plotIndiv(spls, comp = 1:2, rep.space= 'X-variate', ind.names = clean_id[grep("EN",names(clean_id))],
legend = TRUE, title = 'sPLS comp 1 - 2, X-space')
plotIndiv(spls, comp = 1:2, rep.space= 'XY-variate', ind.names = clean_id[grep("EN",names(clean_id))],
legend = TRUE, title = 'sPLS comp 1 - 2, XY-space')
Individual plots can be displayed on three different subspaces spanned either by the X variable, the Y variable or the mean subspace in which coordinates are averaged from the first two subspaces (XY).
plotVar(spls, comp =1:2, var.names = list(X.label = species,
Y.label = TRUE), cex = c(4, 5))
LBA conditionpls <- pls(as.matrix(sapply(clean_log_LBA,as.numeric)), as.matrix(sapply(Y_LBA,as.numeric)), ncomp = 8, mode = "regression")
## Warning in cor(A[[k]], variates.A[[k]]): l'écart type est nulle
spls <- spls(as.matrix(sapply(clean_log_LBA,as.numeric)), as.matrix(sapply(Y_LBA,as.numeric)), ncomp =8,
keepX= c(15,15), keepY=c(7,7), mode = "regression")
## Warning in cor(A[[k]], variates.A[[k]]): l'écart type est nulle
tune.pls <- perf(pls, validation = "Mfold", folds = 20,
progressBar = FALSE, nrepeat = 10)
## Warning in cor(A[[k]], variates.A[[k]]): l'écart type est nulle
tune.spls <- perf(spls, validation = "Mfold", folds = 20,
progressBar = FALSE, nrepeat = 10)
## Warning: The SGCCA algorithm did not converge
## Warning: l'écart type est nulle
## Warning: The SGCCA algorithm did not converge
## Warning: The SGCCA algorithm did not converge
## Warning: The SGCCA algorithm did not converge
tune.pls$Q2.total
## Q2.total
## 1 comp -0.021153712
## 2 comp -0.013099148
## 3 comp -0.018266768
## 4 comp -0.003367460
## 5 comp 0.020938526
## 6 comp -0.053575966
## 7 comp 0.103777737
## 8 comp 0.009090455
plot(tune.pls$Q2.total)
abline(h=0.0975)
plotIndiv(spls, comp = 1:2, rep.space= 'Y-variate', ind.names = clean_id_LBA,
legend = F, title = 'sPLS comp 1 - 2, Y-space')
plotIndiv(spls, comp = 1:2, rep.space= 'X-variate', ind.names = clean_id_LBA,
legend = F, title = 'sPLS comp 1 - 2, X-space')
plotIndiv(spls, comp = 1:2, rep.space= 'XY-variate', ind.names = clean_id_LBA,
legend = F, title = 'sPLS comp 1 - 2, XY-space')
Individual plots can be displayed on three different subspaces spanned either by the X variable, the Y variable or the mean subspace in which coordinates are averaged from the first two subspaces (XY).
plotVar(spls, comp =1:2, var.names = list(X.label = species,
Y.label = TRUE), cex = c(4, 5))
# cim(spls, comp = 1, xlab = "virus", ylab = "genes")
# selectVar(spls, comp = 1)$X$name
# test <- cbind(Y,clean_log[,which(species%in%selectVar(spls, comp = 1)$X$name)])
# cim(as.matrix(sapply(clean_log[,which(species%in%selectVar(spls, comp = 1)$X$name)],as.numeric)))
Regroupment of RSV, PI.3, Coronavirus
group_virus<-sapply(Y,as.numeric)
group_virus<-group_virus-1
group_virus<-group_virus[,1]+group_virus[,2]+group_virus[,3]
group_virus[group_virus>0]<-1
group_virus <- as.factor(group_virus)
group_virus_EN <- group_virus[grep("EN",rownames(Y))]
group_virus_LBA <- group_virus[grep("LBA",rownames(Y))]
group_virus - EN conditionset.seed(11)
res_plsda <- plsda(X=clean_log_EN,Y=group_virus_EN, ncomp = nlevels(group_virus_EN))
## Warning in cor(A[[k]], variates.A[[k]]): l'écart type est nulle
res_perf <- perf(res_plsda, validation = 'Mfold', folds = 5,
progressBar = FALSE, nrepeat = 20)
plot(res_perf, overlay = 'measure', sd = TRUE)
plotIndiv(res_plsda , comp = c(1, 2), ind.names = FALSE, ellipse = TRUE,
legend = TRUE, title = 'PLS-DA (RSV,PI.3 et Coronavirus) - EN')
set.seed(33)
res_plsda <- tune.splsda(X=clean_log_EN,Y=group_virus_EN,
ncomp = nlevels(group_virus_EN),
test.keepX = 1:20, validation = 'Mfold',
folds = 10, dist = 'mahalanobis.dist', nrepeat = 10,
progressBar = F)
## Warning in MCVfold.splsda(X, Y, multilevel = multilevel, validation = validation, : At least one class is not represented in one fold, which may unbalance the error rate.
## Consider a number of folds lower than the minimum in table(Y): 9
## Warning in MCVfold.splsda(X, Y, multilevel = multilevel, validation = validation, : At least one class is not represented in one fold, which may unbalance the error rate.
## Consider a number of folds lower than the minimum in table(Y): 9
plot(res_plsda)
sel_keepX <- res_plsda$choice.keepX[1:2]
sel_keepX
## comp1 comp2
## 18 12
res_splsda <- splsda(X=clean_log_EN,Y=group_virus_EN,
ncomp = nlevels(group_virus_EN),
keepX = sel_keepX)
## Warning in cor(A[[k]], variates.A[[k]]): l'écart type est nulle
plotIndiv(res_splsda, comp = c(1,2), ind.names = FALSE, ellipse = TRUE,
legend = TRUE, title = 'sPLS-DA (RSV,PI.3 and Coronavirus) - EN')
test <- predict(res_splsda,clean_log_EN)
pred <- test$class$mahalanobis.dist[,2]
get.confusion_matrix(group_virus_EN,predicted=pred)
## predicted.as.0 predicted.as.1
## 0 9 0
## 1 0 13
head(selectVar(res_splsda, comp = 1)$name)
## [1] "Leucobacter" "Corynebacterium confusum"
## [3] "Pseudochrobactrum" "Gelidibacter"
## [5] "Streptococcus gallolyticus" "Moraxella lacunata"
plotLoadings(res_splsda, comp = 1, method = 'mean', contrib = 'max',
size.title = 1)
plotLoadings(res_splsda, comp = 2, method = 'mean', contrib = 'max',
size.title = 1)
# cim(res_splsda,comp=1)
# test<-clean_log_EN[,which(species%in%selectVar(res_splsda, comp = 1)$name)]
# cim(sapply(test,as.numeric),row.names = clean_id_EN)
Presence vs Absence of viruses: Student tests
test0 <- clean_log[group_virus_EN=="0",which(selectVar(res_splsda, comp = 1)$name[1]==species)]
test1 <- clean_log[group_virus_EN=="1",which(selectVar(res_splsda, comp = 1)$name[1]==species)]
t.test(test0,test1)
##
## Welch Two Sample t-test
##
## data: test0 and test1
## t = -1.3017, df = 39.609, p-value = 0.2005
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1.8458953 0.3998936
## sample estimates:
## mean of x mean of y
## 1.169070 1.892071
We do not reject \(H_0\) (the difference in means is equal to 0) hence we can say that the presence or not of these three viruses does not change the values of the Leucobacter bateria.
boxplot(test0,test1,names=c("0","1"),
main=selectVar(res_splsda, comp = 1)$name[1],col=c("dodgerblue2","darkorange"))
test0 <- clean_log[group_virus_EN=="0",which(selectVar(res_splsda, comp = 1)$name[2]==species)]
test1 <- clean_log[group_virus_EN=="1",which(selectVar(res_splsda, comp = 1)$name[2]==species)]
t.test(test0,test1)
##
## Welch Two Sample t-test
##
## data: test0 and test1
## t = 2.115, df = 24.322, p-value = 0.04486
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.0267483 2.1261777
## sample estimates:
## mean of x mean of y
## 1.5719196 0.4954565
We reject \(H_0\) (the difference in means is equal to 0) hence we can say that the presence or not of these three viruses changes the values of the Corynebacterium confusum bateria. But the p-value is very close to 0.05…
boxplot(test0,test1,names=c("0","1"),
main=selectVar(res_splsda, comp = 1)$name[2],col=c("dodgerblue2","darkorange"))
#cim(res_splsda, row.sideColors = color.mixo(group_virus_EN),row.names = clean_id_EN)
group_virus - LBA conditionset.seed(11)
res_plsda <- plsda(X=clean_log_LBA,Y=group_virus_LBA, ncomp = nlevels(group_virus_LBA))
## Warning in cor(A[[k]], variates.A[[k]]): l'écart type est nulle
res_perf <- perf(res_plsda, validation = 'Mfold', folds = 5,
progressBar = FALSE, nrepeat = 20)
plot(res_perf, overlay = 'measure', sd = TRUE)
plotIndiv(res_plsda , comp = c(1, 2), ind.names = FALSE, ellipse = TRUE,
legend = TRUE, title = 'PLS-DA (RSV,PI.3 et Coronavirus) - LBA')
set.seed(33)
res_plsda <- tune.splsda(X=clean_log_LBA,Y=group_virus_LBA,
ncomp = nlevels(group_virus_LBA),
test.keepX = 1:20, validation = 'Mfold',
folds = 10, dist = 'mahalanobis.dist', nrepeat = 10,
progressBar = F)
## Warning in MCVfold.splsda(X, Y, multilevel = multilevel, validation = validation, : At least one class is not represented in one fold, which may unbalance the error rate.
## Consider a number of folds lower than the minimum in table(Y): 8
## Warning in MCVfold.splsda(X, Y, multilevel = multilevel, validation = validation, : At least one class is not represented in one fold, which may unbalance the error rate.
## Consider a number of folds lower than the minimum in table(Y): 8
## Warning: The SGCCA algorithm did not converge
## Warning: The SGCCA algorithm did not converge
## Warning: The SGCCA algorithm did not converge
## Warning: The SGCCA algorithm did not converge
## Warning: The SGCCA algorithm did not converge
## Warning: The SGCCA algorithm did not converge
plot(res_plsda)
sel_keepX <- res_plsda$choice.keepX[1:2]
sel_keepX
## comp1 comp2
## 17 11
res_splsda <- splsda(X=clean_log_LBA,Y=group_virus_LBA,
ncomp = nlevels(group_virus_LBA),
keepX = sel_keepX)
## Warning in cor(A[[k]], variates.A[[k]]): l'écart type est nulle
plotIndiv(res_splsda, comp = c(1,2), ind.names = FALSE, ellipse = TRUE,
legend = TRUE, title = 'sPLS-DA (RSV,PI.3 et Coronavirus) - LBA')
test <- predict(res_splsda,clean_log_LBA)
pred <- test$class$mahalanobis.dist[,2]
get.confusion_matrix(group_virus_LBA,predicted=pred)
## predicted.as.0 predicted.as.1
## 0 7 1
## 1 0 14
table(group_virus_LBA)
## group_virus_LBA
## 0 1
## 8 14
head(selectVar(res_splsda, comp = 1)$name)
## [1] "Paracoccus alcaliphilus" "Psychrobacter pulmonis"
## [3] "Brachybacterium sp." "Leptotrichiaceae"
## [5] "Jeotgalicoccus psychrophilus" "Delftia sp."
plotLoadings(res_splsda, comp = 1, method = 'mean', contrib = 'max',
size.title = 1)
plotLoadings(res_splsda, comp = 2, method = 'mean', contrib = 'max',
size.title = 1)
Presence vs Absence of viruses: Student tests
test0 <- clean_log[group_virus_LBA=="0",which(selectVar(res_splsda, comp = 1)$name[1]==species)]
test1 <- clean_log[group_virus_LBA=="1",which(selectVar(res_splsda, comp = 1)$name[1]==species)]
t.test(test0,test1)
##
## Welch Two Sample t-test
##
## data: test0 and test1
## t = 0.44295, df = 33.131, p-value = 0.6607
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.9521709 1.4822678
## sample estimates:
## mean of x mean of y
## 1.760469 1.495421
We do not reject \(H_0\) (the difference in means is equal to 0) hence we can say that the presence or not of these three viruses does not change the values of the Psychrobacter pulmonis bateria.
boxplot(test0,test1,names=c("0","1"),
main=selectVar(res_splsda, comp = 1)$name[1],col=c("dodgerblue2","darkorange"))
test0 <- clean_log[group_virus_LBA=="0",which(selectVar(res_splsda, comp = 1)$name[2]==species)]
test1 <- clean_log[group_virus_LBA=="1",which(selectVar(res_splsda, comp = 1)$name[2]==species)]
t.test(test0,test1)
##
## Welch Two Sample t-test
##
## data: test0 and test1
## t = 2.2409, df = 19.028, p-value = 0.03716
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.113116 3.310445
## sample estimates:
## mean of x mean of y
## 2.6367155 0.9249348
We reject \(H_0\) (the difference in means is equal to 0) hence we can say that the presence or not of these three viruses changes the values of the Psychrobacter pulmonis bateria. But the p-value is close to 0.05…
boxplot(test0,test1,names=c("0","1"),
main=selectVar(res_splsda, comp = 1)$name[2],col=c("dodgerblue2","darkorange"))
#cim(res_splsda, row.sideColors = color.mixo(group_virus_LBA),row.names = clean_id_LBA)
PI.3 - EN conditionset.seed(11)
res_plsda <- plsda(X=clean_log_EN,Y=Y_EN$Ct.PI.3, ncomp = nlevels(Y_EN$Ct.PI.3))
## Warning in cor(A[[k]], variates.A[[k]]): l'écart type est nulle
res_perf <- perf(res_plsda, validation = 'Mfold', folds = 5,
progressBar = FALSE, nrepeat = 20)
## Warning in MCVfold.splsda(X, Y, multilevel = multilevel, validation = validation, : At least one class is not represented in one fold, which may unbalance the error rate.
## Consider a number of folds lower than the minimum in table(Y): 2
## Warning in MCVfold.splsda(X, Y, multilevel = multilevel, validation = validation, : At least one class is not represented in one fold, which may unbalance the error rate.
## Consider a number of folds lower than the minimum in table(Y): 2
plot(res_perf, overlay = 'measure', sd = TRUE)
plotIndiv(res_plsda , comp = c(1, 2), ind.names = FALSE, ellipse = TRUE,
legend = TRUE, title = 'PLS-DA (PI.3) - EN')
set.seed(33)
res_plsda <- tune.splsda(X=clean_log_EN,Y=Y_EN$Ct.PI.3,
ncomp = nlevels(Y_EN$Ct.PI.3),
test.keepX = 1:20, validation = 'Mfold',
folds = 10, dist = 'mahalanobis.dist', nrepeat = 10,
progressBar = F)
plot(res_plsda)
sel_keepX <- res_plsda$choice.keepX[1:2]
sel_keepX
Error: At least one class is not represented in one fold, which may unbalance the error rate.
res_splsda <- splsda(X=clean_log_EN,Y=Y_EN$Ct.PI.3,
ncomp = nlevels(Y_EN$Ct.PI.3),
keepX = sel_keepX)
plotIndiv(res_splsda, comp = c(1,2), ind.names = FALSE, ellipse = TRUE,
legend = TRUE, title = 'sPLS-DA (PI.3) - EN')
head(selectVar(res_splsda, comp = 1)$name)
plotLoadings(res_splsda, comp = 1, method = 'mean', contrib = 'max',
size.title = 1)
plotLoadings(res_splsda, comp = 2, method = 'mean', contrib = 'max',
size.title = 1)
PI.3 - LBA conditionset.seed(11)
res_plsda <- plsda(X=clean_log_LBA,Y=Y_LBA$Ct.PI.3, ncomp = nlevels(Y_LBA$Ct.PI.3))
res_perf <- perf(res_plsda, validation = 'Mfold', folds = 5,
progressBar = FALSE, nrepeat = 20)
plot(res_perf, overlay = 'measure', sd = TRUE)
plotIndiv(res_plsda , comp = c(1, 2), ind.names = FALSE, ellipse = TRUE,
legend = TRUE, title = 'PLS-DA (PI.3) - LBA')
Error: there is unbalanced class of 1
set.seed(33)
res_plsda <- tune.splsda(X=clean_log_LBA,Y=Y_LBA$Ct.PI.3,
ncomp = nlevels(Y_LBA$Ct.PI.3),
test.keepX = 1:20, validation = 'Mfold',
folds = 10, dist = 'mahalanobis.dist', nrepeat = 10,
progressBar = F)
plot(res_plsda)
sel_keepX <- res_plsda$choice.keepX[1:2]
sel_keepX
Error: there is unbalanced class of 1
res_splsda <- splsda(X=clean_log_LBA,Y=Y_LBA$Ct.PI.3,
ncomp = nlevels(Y_LBA$Ct.PI.3),
keepX = sel_keepX)
plotIndiv(res_splsda, comp = c(1,2), ind.names = FALSE, ellipse = TRUE,
legend = TRUE, title = 'sPLS-DA (PI.3) - LBA')
head(selectVar(res_splsda, comp = 1)$name)
plotLoadings(res_splsda, comp = 1, method = 'mean', contrib = 'max',
size.title = 1)
plotLoadings(res_splsda, comp = 2, method = 'mean', contrib = 'max',
size.title = 1)
group_virus - EN condition:fit <- randomForest(group_virus_EN ~ ., data = X_EN,classwt=table(group_virus_EN), ntree=5000)
fit$confusion
## 0 1 class.error
## 0 1 8 0.88888889
## 1 1 12 0.07692308
group_virus - LBA condition:fit <- randomForest(group_virus_LBA ~ ., data = X_LBA,classwt=table(group_virus_LBA), ntree=5000)
fit$confusion
## 0 1 class.error
## 0 0 8 1.0000000
## 1 2 12 0.1428571
sessionInfo()
## R version 3.4.1 (2017-06-30)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.3 LTS
##
## Matrix products: default
## BLAS: /usr/lib/libblas/libblas.so.3.6.0
## LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
##
## locale:
## [1] LC_CTYPE=fr_FR.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=fr_FR.UTF-8 LC_COLLATE=fr_FR.UTF-8
## [5] LC_MONETARY=fr_FR.UTF-8 LC_MESSAGES=fr_FR.UTF-8
## [7] LC_PAPER=fr_FR.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] randomForest_4.6-12 reshape2_1.4.2 metagenomeSeq_1.18.0
## [4] RColorBrewer_1.1-2 glmnet_2.0-13 foreach_1.4.3
## [7] Matrix_1.2-11 limma_3.32.10 Biobase_2.36.2
## [10] BiocGenerics_0.22.1 mixOmics_6.3.0 ggplot2_2.2.1
## [13] lattice_0.20-35 MASS_7.3-47
##
## loaded via a namespace (and not attached):
## [1] gtools_3.5.0 purrr_0.2.4 colorspace_1.2-4
## [4] htmltools_0.3.6 yaml_2.1.14 rlang_0.1.4
## [7] glue_1.2.0 bindrcpp_0.2 matrixStats_0.52.2
## [10] plyr_1.8.4 bindr_0.1 stringr_1.2.0
## [13] munsell_0.4.2 gtable_0.1.2 caTools_1.17.1
## [16] htmlwidgets_0.9 codetools_0.2-15 evaluate_0.10.1
## [19] labeling_0.3 knitr_1.17 httpuv_1.3.5
## [22] rARPACK_0.11-0 Rcpp_0.12.13 KernSmooth_2.23-15
## [25] xtable_1.8-2 corpcor_1.6.9 scales_0.5.0
## [28] backports_1.1.1 gdata_2.18.0 jsonlite_1.5
## [31] mime_0.5 RSpectra_0.12-0 gplots_3.0.1
## [34] gridExtra_2.3 ellipse_0.3-8 digest_0.6.12
## [37] stringi_1.1.5 dplyr_0.7.4 shiny_1.0.5
## [40] grid_3.4.1 rprojroot_1.2 bitops_1.0-6
## [43] tools_3.4.1 magrittr_1.5 rgl_0.98.1
## [46] lazyeval_0.2.1 tibble_1.3.4 tidyr_0.7.2
## [49] pkgconfig_2.0.1 assertthat_0.2.0 rmarkdown_1.7
## [52] iterators_1.0.8 R6_2.2.2 igraph_1.1.2
## [55] compiler_3.4.1